FPGA LOOKUP TABLE WITH TRANSMISSION GATE STRUCTURE
FOR RELIABLE LOW-VOLTAGE OPERATION 



FIELD OF THE INVENTION

[0001] The invention relates to Field Programmable Gate 
Arrays (FPGAs). More particularly, the invention relates to
a lookup table for an FPGA that is designed for reliable low- 
voltage operation. 

BACKGROUND OF THE INVENTION

[0002] Programmable logic devices (PLDs) are a well-known 
type of digital integrated circuit that can be programmed to 
perform specified logic functions. One type of PLD, the 
field programmable gate array (FPGA), typically includes an
array of configurable logic blocks (CLBs) surrounded by a
ring of programmable input/output blocks (IOBs). The CLBs
and IOBs are interconnected by a programmable interconnect
structure. Some FPGAs also include additional logic blocks 
with special purposes (e.g., DLLs, RAM, and so forth). 
[0003] The CLBs, IOBs, interconnect, and other logic 
blocks are typically programmed by loading a stream of 
configuration data (bitstream) into internal configuration 
memory cells that define how the CLBs, IOBs, and interconnect 
are configured. The configuration data can be read from 
memory (e.g., an external PROM) or written into the FPGA by 
an external device. The collective states of the individual 
memory cells then determine the function of the FPGA. 
[0004] A CLB typically includes at least two types of sub- 
circuits, with supporting logic. One sub-circuit type is the 
register element, which can be, for example, a flip-flop 
configurably programmable as a latch. The other common sub- 
circuit is a function generator, often a 4-input function 
generator that can provide any function of up to four input 
signals. The function generator is typically implemented as 
a lookup table (LUT), often a static RAM (SRAM).
[0005] For example, a 4-input LUT is typically implemented
using a 16x1 SRAM. The SRAM is programmed (written to) 
during the configuration of the FPGA, using values included 
in the configuration bitstream. There are 16 possible 
combinations of the four input signals, so each of the 16 
memory locations in the lookup table is programmed with the 
correct output value for the corresponding four input values. 
The four input values provide the four address bits for the 
16x1 SRAM. 
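
As an illustration of the 16x1 SRAM described above, the following
minimal sketch models a 4-input LUT as a 16-entry table. The names
program_lut and read_lut, and the choice of the first input as the least
significant address bit, are assumptions made for this example only; the
actual address ordering is not stated here.

```python
def program_lut(func, num_inputs=4):
    """Build the stored bits for a Boolean function of num_inputs inputs.

    Assumption: input 0 is the least significant address bit.
    """
    table = []
    for address in range(2 ** num_inputs):
        inputs = [(address >> bit) & 1 for bit in range(num_inputs)]
        table.append(1 if func(*inputs) else 0)
    return table


def read_lut(table, *inputs):
    """Use the input values as address bits and return the stored bit."""
    address = sum(bit << position for position, bit in enumerate(inputs))
    return table[address]


# Two example configurations: a 4-input AND and a 4-input XOR.
and4 = program_lut(lambda a, b, c, d: a and b and c and d)
xor4 = program_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Once the table is written, evaluating the function is a single read:
changing the logic function means rewriting the 16 stored bits, not
rewiring the circuit.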

[0006] One FPGA, the Xilinx Virtex®-II FPGA, is described 
in detail in pages 33-75 of the "Virtex-II Platform FPGA 
Handbook", published December, 2000, available from Xilinx, 
Inc., 2100 Logic Drive, San Jose, California 95124, which
pages are incorporated herein by reference. Fig. 1 is a 
simplified block diagram of a Virtex-II CLB. 
[0007] CLB 100 includes four "slices" SLICE_0-3, each 
slice including the logic shown in Fig. 1 for SLICE_0. 
(Other logic in the slice not relevant to the present 
application is omitted from Fig. 1, for clarity.) Each slice 
includes two LUTs 101-102. Each LUT can be programmed to 
function as any of a 4-input lookup table, a 16-bit shift 
register, and 16 bits of random access memory (RAM) in any of 
several configurations. When the LUTs are configured to 
function as RAM, a write strobe generator circuit 105 is 
active, and controls the write functions of the RAM. Each 
LUT 101-102 has two output signals OUT1 and OUT2. (In the
present specification, the same reference characters are used 
to refer to terminals, signal lines, and their corresponding 
signals.) Both output signals OUT1-OUT2 have the same value; 
the output value is provided in duplicate merely to speed up 
the output path for each output signal. 

X-1128-1D US PATENT

[0008] Multiplexer MUX1 passes either the first output
OUT1 of function generator 101 or an independent input signal
Reg_DI_1 to 1-bit register 103. Register 103 can be
configured as either a flip-flop or a latch. The outputs of
LUT 101 and register 103 are both optionally provided as
outputs of the slice (labeled D1 and Q1, respectively, in
Fig. 1). Thus, the LUT and register can be used
independently of each other or can be coupled together so the
register stores the LUT output signal.

[0009] The second LUT output, OUT2, is optionally used to
control the carry logic within the half-slice. LUT output
signal OUT2 is coupled to the select terminal of carry
multiplexer CM1, and selects either the previous carry-out
signal or a new input signal to place on the carry-out
terminal COUT.

[0010] The elements in the other half of the slice, 
including LUT 102, multiplexer MUX2, carry multiplexer CM2, 
and 1-bit register 104, are coupled together in a similar 
manner.

[0011] Fig. 2 shows the internal structure of the LUT 
included in the Virtex-II FPGA, i.e., LUTs 101 and 102 of 
Fig. 1. Again, extraneous logic is omitted from the drawing, 
for clarity. For example, the configuration logic used to 
load initial values into RAM cells RB201-RB216 is not shown. 
This logic and other omitted circuitry is well known in the 
art of FPGA design. 

[0012] The Virtex-II LUT (101a in Fig. 2) includes 16 
memory cells RB201-RB216. These memory cells are used to 
store the 16 possible output values for the four input 
signals IN1-IN4. Memory cells RB201-RB216 are accessed via 
several control and data signals. For example, signals 
CTRL/DATA1 access each memory cell, and include configuration 
control signals, write control signals (such as write strobe 
signal WS shown in Fig. 1), a direct data input signal (e.g., 
signals RAM_DI_1, RAM_DI_2 in Fig. 1), an initialization 
control signal, and so forth. Signals CTRL/DATA2 pass
serially through each memory cell, and include configuration 
input data, a serial line used when the LUT is configured as 
a serial register, and so forth.
[0013] Each memory cell RB201-RB216 provides one output
signal, of which one must be selected. The 16 output signals
are reduced to four, first by eliminating half of the signals
using input signal IN1, then by eliminating another half of
the signals using input signal IN2. For example, the output
of memory cell RB201 passes through N-channel transistor 211
whenever signal IN1 is high, while the output of memory cell
RB202 passes through N-channel transistor 212 whenever input
signal IN1 is low (i.e., the output of inverter INV1 is
high). The selected one of these two output signals passes
through N-channel transistor 231 whenever signal IN2 is high.
[0014] Similarly, the output of memory cell RB203 passes 
through N-channel transistor 213 whenever signal IN1 is high,
while the output of memory cell RB204 passes through N-
channel transistor 214 whenever input signal IN1 is low
(i.e., the output of inverter INV1 is high). The selected
one of these two output signals passes through N-channel
transistor 232 whenever signal IN2 is low (i.e., the output
of inverter INV2 is high). Thus, the output of one of memory
cells RB201-RB204 is passed to node A, based on the values of
signals IN1 and IN2.

[0015] Similarly, the output of one of memory cells RB205- 
RB208 is passed to node B, the output of one of memory cells 
RB209-RB212 is passed to node C, and the output of one of 
memory cells RB213-RB216 is passed to node D, also based on 
the values of signals IN1 and IN2.
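
The two selection stages described above can be sketched at the logic
level as follows. This is a behavioral illustration only: voltage levels
are not modeled, the function names are hypothetical, and the sketch
assumes that the groups feeding nodes B, C, and D use the same IN1/IN2
polarity that the text gives for node A.

```python
def select_group(cells, in1, in2):
    """Select one of four stored bits, e.g. RB201..RB204, for one node.

    Mirrors the description of node A: IN1 picks within each pair,
    then IN2 picks between the two pairs.
    """
    first_pair = cells[0] if in1 else cells[1]    # transistors 211 / 212
    second_pair = cells[2] if in1 else cells[3]   # transistors 213 / 214
    return first_pair if in2 else second_pair     # transistors 231 / 232


def first_two_stages(memory, in1, in2):
    """memory holds 16 values for RB201..RB216; returns nodes A, B, C, D."""
    return [select_group(memory[i:i + 4], in1, in2) for i in range(0, 16, 4)]


# Using the cell index as the stored "value" shows which cell reaches each node.
nodes_when_both_high = first_two_stages(list(range(16)), 1, 1)
nodes_when_both_low = first_two_stages(list(range(16)), 0, 0)
```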

[0016] Coupled to each of nodes A-D is a pull-up (241-244, 
respectively) implemented as a P-channel transistor coupled 
between the node and power high VDD. The pull-up is 
controlled by power-on reset signal PORB. During a power-on 
or reset sequence signal PORB is low, forcing each of nodes 
A-D to a high value and ensuring thereby that the LUT output 
signals OUT1-OUT2 are high after a power-on or reset 
sequence.

[0017] Node A then passes through a half-latch 245 to node 
E. Half-latch 245 includes an inverter 251 that buffers (and
inverts) the signal on node A. However, a limitation of the
circuit of Fig. 2 now comes into play. This limitation is
inherent in the properties of N-channel transistors, i.e.,
that a high voltage level passing through an N-channel
transistor is reduced by one threshold voltage of the
transistor. Therefore, to ensure that node A reaches a true
"high" level (i.e., reaches power high VDD when the node is
high), a second pull-up 261 is included, forming half-latch
245. When node A is high, inverter 251 drives a low value, 
which turns on pull-up (P-channel transistor) 261. Thus, 
node A is pulled all the way to VDD, ensuring a reliable 
value on node A and hence on node E. 
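
The restoring action of the half-latch can be sketched with a simple
first-order model. The function name half_latch and the inverter
switching point of VDD/2 are assumptions made for illustration; the real
switching point depends on transistor sizing and is not given here.

```python
def half_latch(node_a, vdd, switch_point=None):
    """Return (voltage on node A after restoration, logic value on node E).

    Assumption: inverter 251 switches when its input crosses VDD/2.
    """
    if switch_point is None:
        switch_point = vdd / 2
    if node_a > switch_point:
        # Inverter 251 drives low, which turns on P-channel pull-up 261
        # and pulls node A the rest of the way to VDD.
        return vdd, 0
    # Node A is read as low; the pull-up stays off and node E is high.
    return node_a, 1


# A degraded high of 1.1 V on a 1.5 V supply is restored to the full rail.
restored_high = half_latch(1.1, 1.5)
untouched_low = half_latch(0.0, 1.5)
```

Note that the restoration only starts once the degraded level is already
high enough to switch the inverter, which is why the scheme depends on
VDD being well above the threshold voltage.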

[0018] Similarly, half-latch 246 is provided between nodes
B and F, half-latch 247 is provided between nodes C and G,
and half-latch 248 is provided between nodes D and H.
[0019] The 16 outputs from memory cells RB201-RB216 have 
now been reduced to four signals on nodes E-H. Signal IN3 is 
now used to select one of signals E and F and pass the 
selected signal to node J, and to select one of signals G and 
H and pass the selected signal to node K. 

[0020] Each of two logically identical output circuits now 
selects one of the two nodes J and K based on the value of 
signal IN4, and passes the selected signal to a half-latch
and thence to the corresponding LUT output terminal. As
described above in relation to Fig. 1, the LUT has two
logically identical output signals OUT1 and OUT2, a
configuration that enhances the performance of the CLB. 
[0021] The first output circuit includes N-channel
transistors 281, 283 and half-latch 291, and provides output
signal OUT1 to the direct output D1 and multiplexer MUX1 of
the CLB in Fig. 1. When signal IN4 is high, the signal on
node J is passed through transistor 281 to node L and hence
to half-latch 291 and output node OUT1. When signal IN4 is
low (i.e., the output of inverter INV4 is high), the signal
on node K is passed through transistor 283 to node L and
hence to half-latch 291 and output node OUT1.
[0022] Similarly, the second output circuit includes N-
channel transistors 282, 284 and half-latch 292, and provides
output signal OUT2 to carry multiplexer CM1 of the CLB in
Fig. 1. When signal IN4 is high, the signal on node J is
passed through transistor 282 to node M and hence to half-
latch 292 and output node OUT2. When signal IN4 is low
(i.e., the output of inverter INV4 is high), the signal on
node K is passed through transistor 284 to node M and hence
to half-latch 292 and output node OUT2.

[0023] Note that half-latches are again required on the 
output signals to ensure reliable values on the output 
terminals OUT1 and OUT2.

[0024] By passing the memory cell output signals through a 
series of N-channel transistors and half-latches, a reliable
circuit is provided that has the advantage of being 
relatively small. In other words, it uses a small number of 
transistors for the function performed, and it uses largely 
N-channel transistors, which are smaller than P-channel 
transistors designed to operate under the same conditions. 
Traditionally, small size is an important goal when designing 
memory arrays such as LUTs, and particularly so in FPGAs 
where hundreds or even thousands of copies of the LUT can be 
included in each device. 

[0025] The LUT structure of Fig. 2 works well at present 
operating voltage levels, e.g., at 1.5 volts. However, FPGA 
operating voltages are consistently being reduced. A lower 
operating voltage offers the advantage of reduced power 
consumption. Further, lower operating voltages are required 
for the shorter gate length fabrication processes now being 
developed. Therefore, circuits in FPGAs being designed today 
will operate at even lower voltage levels, e.g., 1.2 volts. 
[0026] As described above in relation to Fig. 2, a high 
voltage level passing through an N-channel transistor is 
reduced by one threshold voltage of the transistor (Vth).
When the power high voltage level VDD is much greater than
Vth, this limitation can be easily overcome, e.g., by the use
of half-latches as in the LUT of Fig. 2. However, as VDD
drops closer to Vth, this type of corrective measure is no 
longer adequate. 
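
A rough numeric sketch may help show why the margin shrinks. The
threshold voltage of 0.4 V and the inverter switching point of VDD/2
used below are hypothetical values chosen only for illustration; neither
figure comes from this specification.

```python
ASSUMED_VTH = 0.4  # volts; hypothetical N-channel threshold voltage


def degraded_high(vdd, vth=ASSUMED_VTH):
    """High level after passing through an N-channel pass transistor."""
    return vdd - vth


def restore_margin(vdd, vth=ASSUMED_VTH):
    """Distance between the degraded high and an assumed VDD/2 switch point.

    A half-latch can only begin restoring the node while this is positive.
    """
    return degraded_high(vdd, vth) - vdd / 2


margin_at_1v5 = restore_margin(1.5)
margin_at_1v2 = restore_margin(1.2)
```

Under these assumptions the margin works out to VDD/2 minus Vth, so it
falls as VDD is reduced and reaches zero when VDD is twice the threshold
voltage.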

[0027] Therefore, it is desirable to provide a LUT 
structure for an FPGA that can reliably perform at an 
operating voltage closer to the threshold voltage level of an 
N-channel transistor than is possible with known LUT 
structures.

SUMMARY OF THE INVENTION

[0028] The invention provides a lookup table (LUT) for a 
field programmable gate array (FPGA) that is designed to 
operate reliably at low voltage levels. A LUT designed 
according to the invention includes no unpaired N-channel 
pass gates. Instead, CMOS pass gates are used, which include 
paired N- and P-channel transistors. Unlike an N-channel 
transistor, a CMOS pass gate can pass either a high signal or 
a low signal with no degradation in the voltage level of the 
input signal. 
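
The difference between an unpaired N-channel pass transistor and a CMOS
pass gate can be sketched symbolically. The function names and the
"strong"/"weak" labels below are illustrative conventions, not terms
used elsewhere in this description.

```python
def n_channel(value):
    """Level passed by a conducting N-channel transistor alone."""
    return "strong 0" if value == 0 else "weak 1"


def p_channel(value):
    """Level passed by a conducting P-channel transistor alone."""
    return "weak 0" if value == 0 else "strong 1"


def cmos_pass_gate(value, n_gate, p_gate):
    """Paired N- and P-channel transistors driven by complementary signals."""
    if n_gate == 1 and p_gate == 0:
        # Both transistors conduct; whichever one passes this level
        # without loss sets the output, so neither level is degraded.
        return "strong 0" if value == 0 else "strong 1"
    if n_gate == 0 and p_gate == 1:
        return "high-Z"  # both transistors off, input is isolated
    raise ValueError("gate terminals must carry complementary values")


high_through_nmos = n_channel(1)
high_through_cmos = cmos_pass_gate(1, n_gate=1, p_gate=0)
```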

[0029] The described implementation is counter-intuitive,
because of the significant increase in gate count compared to 
existing lookup tables. However, this disadvantage is 
mitigated in some embodiments by removing the half-latches 
required in current designs. In some embodiments, the 
circuit is also reduced in size by removing initialization 
circuitry that is rendered unnecessary by the removal of the 
N-channel pass gates. 

[0030] According to one embodiment, the invention provides 
a LUT in an FPGA configurable with a configuration bitstream. 
The LUT includes N LUT input terminals, where N is an 
integer; N inverters coupled to the LUT input terminals; a 
LUT output terminal; a plurality of memory cells storing 
values from the configuration bitstream; and a plurality of 
CMOS pass gates coupled between the output terminals of each 
memory cell and the LUT output terminal. A path between each 
memory cell and the LUT output terminal traverses N of the
CMOS pass gates. Each CMOS pass gate on a given path has a
first gate terminal coupled to a different one of the LUT 
input terminals and a second gate terminal coupled to the 
output terminal of the associated inverter. 
[0031] In some embodiments, the first gate terminal of 
each CMOS pass gate is an N-terminal and the second gate 
terminal is a P-terminal. In some embodiments, N is four,
and some embodiments include 16 memory cells. Some
embodiments include one or two inverters coupled to the LUT
output terminal.
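
The structure just summarized can be sketched for a general N as a
binary selection tree. The function names are hypothetical, output
inverters are omitted, and the sketch assumes that a high input selects
the first signal of each pair, which is the polarity described for the
first stage of Fig. 2.

```python
def pass_gates_per_stage(n):
    """Number of CMOS pass gates controlled by each of the N inputs."""
    return [2 ** (n - stage) for stage in range(n)]


def lut_output(memory, inputs):
    """Reduce 2**N stored values to one; each input halves the candidates.

    Every path from a memory cell to the output crosses one pass gate
    per input, i.e. N pass gates in total.
    """
    level = list(memory)
    for signal in inputs:
        # Assumed polarity: input high passes the first of each pair.
        level = [level[i] if signal else level[i + 1]
                 for i in range(0, len(level), 2)]
    return level[0]


stages_for_four_inputs = pass_gates_per_stage(4)
```

For N equal to four this gives 16, 8, 4, and 2 pass gates per stage, or
30 in total.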

[0032] Some embodiments include an additional CMOS pass 
gate on each path between a memory cell and the LUT output 
terminal, M additional LUT input terminals, where M is an 
integer, and a decoder circuit. The decoder circuit has 
input terminals coupled to the M additional LUT input 
terminals and output terminals coupled to the gate terminals
of the additional CMOS pass gates. The decoder circuit 
decodes the M input signals, then provides decoded output 
signals that can efficiently be used to select a LUT output 
signal. In one embodiment where N is two, M is also two. 
One such embodiment includes 16 memory cells. 
[0033] The presence of the decoder circuit reduces the 
number of CMOS pass gates on the path through the LUT, at the 
cost of an increased delay on the LUT data input signals 
provided to the decoder. The increased delay on these input 
paths can be mitigated if the FPGA implementation software is 
designed to assign less speed-critical signals to the slower 
data input terminals. 
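
The tradeoff can be made concrete with a short count. The function names
below are hypothetical, and the totals follow only from the structure
summarized above: N selection stages followed by one decoded pass gate
per remaining node.

```python
def tree_lut(n_inputs):
    """Pass-gate counts when every input drives its own selection stage."""
    return {
        "gates_on_path": n_inputs,
        "total_pass_gates": 2 ** (n_inputs + 1) - 2,
    }


def decoded_lut(n, m):
    """Pass-gate counts with N tree stages and M decoded inputs."""
    cells = 2 ** (n + m)
    tree_gates = sum(cells >> stage for stage in range(n))
    return {
        "gates_on_path": n + 1,                    # N stages plus one decoded gate
        "total_pass_gates": tree_gates + 2 ** m,   # one extra gate per remaining node
    }


four_input_tree = tree_lut(4)
two_plus_two_decoded = decoded_lut(2, 2)
```

For a 16-cell LUT the decoded form shortens the data path from four pass
gates to three.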

[0034] Another embodiment of the invention is directed to 
a configurable logic block (CLB) in an FPGA, the CLB 
including at least one LUT substantially as described above.

BRIEF DESCRIPTION OF THE DRAWINGS

[0035] The present invention is illustrated by way of 
example, and not by way of limitation, in the following
figures, in which like reference numerals refer to similar
elements.

[0036] Fig. 1 is a block diagram of a configurable logic 
block (CLB) from a Xilinx Virtex-II FPGA. 

[0037] Fig. 2 shows a lookup table (LUT) from the Virtex- 
II CLB of Fig. 1. 

[0038] Fig. 3 shows a first lookup table that can be used 
with the CLB of Fig. 1, designed for use at a low operating 
voltage according to a first embodiment of the present 
invention.

[0039] Fig. 4 shows a second lookup table that can be used 
with the CLB of Fig. 1, designed for use at a low operating 
voltage according to a second embodiment of the present 
invention.

DETAILED DESCRIPTION OF THE DRAWINGS

[0040] In the following description, numerous specific 
details are set forth to provide a more thorough 
understanding of the present invention. However, it will be 
apparent to one skilled in the art that the present invention 
can be practiced without these specific details. For 
example, the examples provided show 4-input lookup tables 
(LUTs); however, the principles of the invention can also be 
applied to LUTs of other sizes. As another example, while 
the LUTs of the present invention are designed to operate 
reliably at low voltage levels, they can also be used at 
standard voltage levels or higher voltage levels. Therefore, 
the scope of the present invention is not limited by the 
design considerations that originally motivated the 
invention. 

[0041] Fig. 3 shows a first low-voltage LUT according to 
one embodiment of the invention. LUT 101b can be used, for 
example, in the CLB of Fig. 1. LUT 101b includes 16 memory 
cells RB301-RB316, 30 CMOS pass gates 311-326, 331-338, 351-354,
and 361-362, and 10 inverters 341-344, 371-372, and
INV1-INV4.
[0042] Each memory cell RB301-RB316 provides one output
signal, of which one must be selected. The memory cells can 
be the same, for example, as memory cells RB201-RB216 of Fig. 
2. The 16 memory cell output signals are reduced to four, 
first by eliminating half of the signals using input signal 
IN1, then by eliminating another half of the signals using
input signal IN2. The portion of the circuit that performs
these tasks is similar to that of LUT 101a of Fig. 2, except
that the N-channel transistors used as pass gates in LUT 101a
have been replaced by CMOS pass gates. The N-terminal of
each CMOS pass gate is coupled to the same signal as the gate
terminal of the corresponding N-channel transistor in LUT
101a. The P-terminal of each CMOS pass gate is coupled to
the inverse of that signal.

[0043] Therefore, the signal on node P in Fig. 3 is 
similar to the signal on node A in Fig. 2, but with an 
important difference. When a high value is passed to node A, 
the signal is attenuated by having passed through one or more 
N-channel transistors. In other words, when the selected 
memory cell output signal is at power high (VDD), the high
signal at node A has a voltage level of VDD-Vth, or VDD minus 
the threshold voltage level of an N-channel transistor. On 
the other hand, the voltage level at node P is still at 
voltage level VDD. 

[0044] This difference has several significant 
implications. First, LUT 101b can operate at a lower VDD 
level than LUT 101a. Second, half-latches were necessary at
nodes A-D in Fig. 2 to ensure a true "high" value on the
node. These half-latches are not necessary at nodes P-S in
Fig. 3, because a high value on the nodes is already at a VDD
voltage level. Third, pull-ups 241-244 were included in LUT
101a to ensure a high value on nodes A-D (and subsequently on
the LUT output terminals OUT1-OUT2) after a power-on or reset
sequence. These pull-ups are not necessary in the embodiment
of Fig. 3. The reason is that after a power-on or reset
sequence, the output signal from each memory cell is high.
No matter which memory cell is selected, the value at each of
nodes P-S is high, and, because of the CMOS pass gates, that 
value is at a VDD voltage level. 

[0045] Because half-latches are not needed in LUT 101b, 
each of nodes P-S drives an inverter (341-344, respectively), 
and the output of each inverter passes through another CMOS 
pass gate (351-354, respectively). The inverted value from 
node P passes through CMOS pass gate 351 whenever signal IN3 
is high, while the inverted value from node Q passes through 
CMOS pass gate 352 whenever signal IN3 is low (i.e., the 
output of inverter INV3 is high). The selected one of these
two output signals passes through CMOS pass gate 361 to node
V whenever signal IN4 is high.

[0046] Similarly, the inverted value from node R passes 
through CMOS pass gate 353 whenever signal IN3 is high, while 
the inverted value from node S passes through CMOS pass gate 
354 whenever signal IN3 is low (i.e., the output of inverter 
INV3 is high). The selected one of these two output signals
passes through CMOS pass gate 362 to node V whenever signal 
IN4 is low. 

[0047] Two inverters 371-372 are provided to generate the 
two LUT output signals OUT1-OUT2 from the signal on node V. 
In other embodiments, only one output signal is provided. 
[0048] Note that no half-latches are required on any of 
the nodes in LUT 101b to ensure reliable values on the output 
terminals OUT1 and OUT2.
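
The path from nodes P-S to the outputs of LUT 101b can be sketched at the
logic level as follows. The function name is hypothetical and voltage
levels are not modeled; the point of the sketch is the bookkeeping of the
two inversions along the path.

```python
def lut_101b_output_stage(p, q, r, s, in3, in4):
    """Return (OUT1, OUT2) given the logic values on nodes P, Q, R, S."""
    def invert(bit):
        return 1 - bit

    # Inverters 341-344 feed CMOS pass gates 351-354, selected by IN3.
    upper = invert(p) if in3 else invert(q)   # pass gates 351 / 352
    lower = invert(r) if in3 else invert(s)   # pass gates 353 / 354
    # CMOS pass gates 361 / 362 place one of the two on node V, selected by IN4.
    node_v = upper if in4 else lower
    # Inverters 371 and 372 each drive one of the two outputs.
    return invert(node_v), invert(node_v)


outputs_selecting_q = lut_101b_output_stage(0, 1, 0, 0, in3=0, in4=1)
```

Because the selected value is inverted once before node V and once
after it, both outputs carry the value stored in the selected memory
cell.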

[0049] By replacing the N-channel transistors of Fig. 2 
with CMOS pass gates, a LUT is provided in Fig. 3 that has 
the advantage of operating correctly at a relatively low 
voltage level. However, clearly many more transistors are 
required than in the implementation of Fig. 2. Aside from 
the 16 memory cells, the known implementation of Fig. 2 uses 
62 transistors to implement a 4-input LUT, most of them N-
channel transistors (which, as noted above, are smaller than
P-channel transistors designed to operate under the same
conditions). To perform the same function, the novel
implementation of Fig. 3 includes 80 transistors, half of
them P-channel transistors. The disadvantages of this 
increased transistor count are obvious. However, the 
advantage of low-voltage operation is sufficient to make the 
implementation of Fig. 3 advantageous for many applications. 
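
The two transistor counts can be cross-checked with a short tally. The
breakdown for Fig. 2 is inferred from the description above rather than
stated directly: it assumes two transistors per inverter, three per
half-latch (an inverter plus one pull-up), and four pass transistors in
the IN3 stage. The breakdown for Fig. 3 follows the component list
given for LUT 101b.

```python
TRANSISTORS_PER_INVERTER = 2        # one N-channel and one P-channel device
TRANSISTORS_PER_HALF_LATCH = 3      # assumed: inverter plus one P-channel pull-up
TRANSISTORS_PER_CMOS_PASS_GATE = 2  # paired N- and P-channel transistors


def fig2_transistors():
    pass_transistors = 16 + 8 + 4 + 4   # IN1, IN2, IN3 stages plus 281-284
    input_inverters = 4                  # INV1-INV4
    pull_ups = 4                         # 241-244
    half_latches = 4 + 2                 # 245-248 plus 291-292
    return (pass_transistors
            + input_inverters * TRANSISTORS_PER_INVERTER
            + pull_ups
            + half_latches * TRANSISTORS_PER_HALF_LATCH)


def fig3_transistors():
    cmos_pass_gates = 16 + 8 + 4 + 2     # 311-326, 331-338, 351-354, 361-362
    inverters = 4 + 2 + 4                # 341-344, 371-372, INV1-INV4
    return (cmos_pass_gates * TRANSISTORS_PER_CMOS_PASS_GATE
            + inverters * TRANSISTORS_PER_INVERTER)


fig2_total = fig2_transistors()
fig3_total = fig3_transistors()
```

Under these assumptions the tallies reproduce the figures of 62 and 80
transistors, excluding the 16 memory cells in both cases.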
[0050] Fig. 4 shows another embodiment, this embodiment 
using 100 transistors (and 16 memory cells) to implement a 4- 
input LUT. Again, half of these transistors are P-channel 
transistors. However, LUT 101c of Fig. 4 has further 
advantages in addition to supporting low-voltage operation. 
The embodiment of Fig. 4 is similar to that of Fig. 3, except 
that two of the input signals (IN3 and IN4) are decoded 
before being used to select among the memory cell output 
signals.

[0051] The leftmost portion of LUT 101c is the same as 
that of LUT 101b of Fig. 3. In other words, the circuits 
from the memory cells through nodes P-S are the same in Figs. 
3 and 4. Also, each of nodes P-S drives an inverter (441- 
444, respectively, in LUT 101c), which in turn provides a 
signal to a CMOS pass gate (451-454, respectively, in LUT 
101c). However, CMOS pass gates 451-454 are controlled by a
decoder circuit comprising NAND gates 481-484 and inverters 
INV3-INV4. Input signals IN3 and IN4 are decoded by the 
decoder circuit, such that only one of NAND gates 481-484 
provides a high value at any given time. The NAND gate 
providing the high value selects one of nodes P-S to provide 
an inverted value to node W. 
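
The decoded selection can be sketched at the logic level as follows. The
function names are hypothetical, and the mapping of input combinations
to nodes is an assumption borrowed from the ordering described for
Fig. 3; the gate-level polarity of the decoder outputs is not modeled.

```python
def decode(in3, in4):
    """Return one-hot select lines for nodes P, Q, R, S.

    Assumed ordering: IN4 high picks the P/Q pair, IN3 high picks the
    first node of the pair.
    """
    return [
        int(in3 and in4),                # selects P
        int((not in3) and in4),          # selects Q
        int(in3 and (not in4)),          # selects R
        int((not in3) and (not in4)),    # selects S
    ]


def lut_101c_output_stage(nodes, in3, in4):
    """nodes holds the logic values on P, Q, R, S; returns (OUT1, OUT2)."""
    selects = decode(in3, in4)
    # Inverters 441-444 invert each node; exactly one of pass gates
    # 451-454 is enabled and places that inverted value on node W.
    node_w = next(1 - value for value, enabled in zip(nodes, selects) if enabled)
    # Inverters 471 and 472 restore the polarity on the two outputs.
    return 1 - node_w, 1 - node_w


select_lines = [decode(in3, in4) for in4 in (1, 0) for in3 in (1, 0)]
```

Each data signal crosses only one pass gate after nodes P-S, while IN3
and IN4 must first propagate through the decoder, which is the source of
the added delay on those two inputs.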

[0052] Two inverters 471-472 are provided to generate the
two LUT output signals OUT1-OUT2 from the signal on node W.
In other embodiments, only one output signal is provided. 
[0053] Those having skill in the relevant arts of the 
invention will now perceive various modifications and 
additions that can be made as a result of the disclosure 
herein. For example, memory cells, registers, transistors, 
N-channel transistors, P-channel transistors, CMOS pass 
gates, inverters, NAND gates, FPGAs, CLBs, multiplexers,
decoder circuits, decoders, and other components other than
those described herein can be used to implement the 
invention. Active-high signals can be replaced with active- 
low signals by making straightforward alterations to the 
circuitry, such as are well known in the art of circuit 
design. 

[0054] Moreover, some components are shown directly 
connected to one another while others are shown connected via 
intermediate components. In each instance the method of 
interconnection establishes some desired electrical 
communication between two or more circuit nodes. Such 
communication may often be accomplished using a number of 
circuit configurations, as will be understood by those of 
skill in the art. 

[0055] Accordingly, all such modifications and additions 
are deemed to be within the scope of the invention, which is 
to be limited only by the appended claims and their 
equivalents.